“One of the key concerns of statistics is the drawing of conclusions from a set of observed data. These data will usually consist of a sample of certain elements of a population, and the objective will be to use the sample to draw conclusions about the entire population.”
In this lecture, we will learn how to construct the sampling distributions of sample statistics (e.g., minimum, maximum, mean, median, proportion, standard deviation).
Central Limit Theorem:
For a random sample of size \(n\) from a population with mean \(\mu\) and standard deviation \(\sigma\), the sampling distribution of the sample mean, \(\overline{X}\), has a mean of \(\mu\) and a standard deviation of \(\dfrac{\sigma}{\sqrt n}\), and it is approximately normal when \(n\) is sufficiently large.
\[E(\overline{X})=\mu, \mbox{ and }SD(\overline{X})=\dfrac{\sigma}{\sqrt n}.\]
“… practically speaking, no matter how nonnormal the underlying population distribution is, the sample mean of a sample size of at least 30 will be approximately normal.”
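The rule of thumb above can be checked with a small simulation. The sketch below is only an illustration under assumptions that are not part of the notes: the population is taken to be exponential with rate 1 (so \(\mu = 1\) and \(\sigma = 1\), strongly right-skewed), and the seed and the number of repetitions are arbitrary.

```r
set.seed(1)                        # fixed seed so the run is reproducible

n    <- 30                         # sample size from the rule of thumb
reps <- 5000                       # number of simulated samples (arbitrary choice)

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

clt_mean  <- mean(xbar)            # should be close to mu = 1
clt_sd    <- sd(xbar)              # should be close to sigma / sqrt(n)
theory_sd <- 1 / sqrt(n)

# Histogram of the simulated sample means on the density scale
hist(xbar, breaks = 40, freq = FALSE,
     main = "Sample means, n = 30, exponential population",
     xlab = "Sample mean")
# Overlay the normal curve suggested by the Central Limit Theorem
curve(dnorm(x, mean = 1, sd = theory_sd), add = TRUE, lwd = 2)
```

Comparing `clt_mean` with 1 and `clt_sd` with `theory_sd` shows how closely the simulated sample means follow the stated mean and standard deviation, even though the population itself is far from normal.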
Suppose \(X\) and \(Y\) are independent random variables (they do not need to be normal for the properties below to hold). The additive properties are
\[E[X + Y] = E[X] + E[Y],\] \[Var[X + Y] = Var[X] + Var[Y].\]
The constant multiple properties are \[E[\color{red}cX] = \color{red}cE[X],\] \[Var[\color{red}cX] = \color{red}{c^2} Var[X].\]
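These properties can be checked numerically. The following is a minimal sketch, assuming two hypothetical normal variables and a hypothetical constant \(c = 3\); none of these values come from the notes.

```r
set.seed(2)
m <- 100000                            # number of simulated draws (arbitrary)

x <- rnorm(m, mean = 5,  sd = 2)       # hypothetical X
y <- rnorm(m, mean = -1, sd = 3)       # hypothetical Y, generated independently of X
c_const <- 3                           # hypothetical constant c

# Additive properties: gaps between the left-hand and right-hand sides
add_mean_gap <- mean(x + y) - (mean(x) + mean(y))   # zero up to rounding error
add_var_gap  <- var(x + y) - (var(x) + var(y))      # equals 2 * cov(x, y), small for independent draws

# Constant multiple properties
mult_mean_gap <- mean(c_const * x) - c_const * mean(x)    # zero up to rounding error
mult_var_gap  <- var(c_const * x) - c_const^2 * var(x)    # zero up to rounding error
```

The only gap that is not essentially zero is `add_var_gap`: in a finite sample it equals twice the sample covariance, which shrinks toward zero because `x` and `y` are generated independently.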
If the independent random variables \(X_1, X_2, \ldots, X_n\) are from the same population, whose mean is \(\mu\) and standard deviation is \(\sigma\), then \(E[X_i] = \mu\) and \(Var[X_i] = \sigma^2\) for every \(i = 1, 2, \ldots, n\).
Therefore, \[E[\overline{X}] = E\left[\dfrac{X_1 + X_2 + \cdots + X_n}{n}\right] = \dfrac{E[X_1]+E[X_2] + \cdots + E[X_n]}{n}=\dfrac{n\mu}{n}=\mu.\]
\[Var[\overline{X}] = Var\left[\dfrac{X_1 + X_2 + \cdots + X_n}{n}\right] = \dfrac{Var[X_1]+Var[X_2] + \cdots + Var[X_n]}{n^2}=\dfrac{n\sigma^2}{n^2}=\dfrac{\sigma^2}{n}.\]
\[SD(\overline{X})=\sqrt{Var(\overline{X})}=\dfrac{\sigma}{\sqrt{n}}.\]
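As a quick numerical illustration of this formula, take a hypothetical population standard deviation of \(\sigma = 12\) (a value chosen only for convenience, not taken from the notes). Then
\[n = 9:\ SD(\overline{X}) = \dfrac{12}{\sqrt{9}} = 4, \qquad n = 36:\ SD(\overline{X}) = \dfrac{12}{\sqrt{36}} = 2, \qquad n = 144:\ SD(\overline{X}) = \dfrac{12}{\sqrt{144}} = 1.\]
Because of the \(\sqrt{n}\) in the denominator, quadrupling the sample size only halves the standard deviation of the sample mean.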
If 1 man is randomly selected, find the probability that his weight is less than 167 lb.
If 36 men are randomly selected, find the probability that their average weight is less than 167 lb.
If 1 man is randomly selected, find the probability that his weight is between 170 and 175 lb.
If 64 men are randomly selected, find the probability that their mean weight is between 170 and 175 lb.
You are to design an elevator to safely hold 16 people. Find the maximum allowable weight if we want a 0.95 probability that this maximum will not be exceeded in the worst case when 16 randomly selected males are on it.
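The exercises above could be set up in R along the following lines. This is a sketch only: the population mean and standard deviation used here (`mu`, `sigma`) are hypothetical placeholders, and the single-man calculations additionally assume that weights are normally distributed. Substitute the parameters given with the exercises.

```r
# HYPOTHETICAL population parameters, chosen only for illustration
mu    <- 172   # assumed mean weight (lb)
sigma <- 29    # assumed standard deviation (lb)

# One man: use the (assumed normal) population distribution directly
p_one_below_167 <- pnorm(167, mean = mu, sd = sigma)

# Mean of 36 men: same mean, standard deviation sigma / sqrt(36)
p_mean36_below_167 <- pnorm(167, mean = mu, sd = sigma / sqrt(36))

# One man between 170 and 175 lb
p_one_170_175 <- pnorm(175, mu, sigma) - pnorm(170, mu, sigma)

# Mean of 64 men between 170 and 175 lb
se64 <- sigma / sqrt(64)
p_mean64_170_175 <- pnorm(175, mu, se64) - pnorm(170, mu, se64)

# Elevator: the total weight of 16 men has mean 16 * mu and
# standard deviation sqrt(16) * sigma; take its 95th percentile
max_total <- qnorm(0.95, mean = 16 * mu, sd = sqrt(16) * sigma)
```

The point of the paired calculations is the change in the `sd` argument: probabilities for a single man use \(\sigma\), while probabilities for a sample mean use \(\sigma/\sqrt{n}\), so the mean is far more concentrated around \(\mu\).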
Suppose that the underlying population is large in relation to the sample size \(n\). If the proportion of individuals in the population with a certain characteristic is \(p\), then the sampling distribution of the sample proportion (\(\hat{p}\)) is approximately normal, with \[E[\hat{p}] = p, \quad\mbox{ and }\quad SD[\hat{p}] = \sqrt{\dfrac{p(1-p)}{n}}.\]
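The mean and standard deviation stated above can be compared with a simulation. The sketch below assumes a hypothetical population proportion of \(p = 0.8\) and a sample size of \(n = 50\); it treats the draws as independent, which is reasonable when the population is large relative to \(n\).

```r
set.seed(3)
p    <- 0.8      # hypothetical population proportion
n    <- 50       # sample size
reps <- 10000    # number of simulated samples (arbitrary)

# Each simulated sample proportion is a Binomial(n, p) count divided by n
phat <- rbinom(reps, size = n, prob = p) / n

sim_mean  <- mean(phat)                # compare with p
sim_sd    <- sd(phat)                  # compare with sqrt(p * (1 - p) / n)
theory_sd <- sqrt(p * (1 - p) / n)
```

With these assumed values, `sim_mean` and `sim_sd` should land close to `p` and `theory_sd` respectively, and a histogram of `phat` would show the roughly bell-shaped distribution the statement describes.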
The data file ATL_Departure_Flights_2017.csv contains the flight status (on-time or delayed) of every domestic departure flight at Atlanta Hartsfield-Jackson Airport in 2017.
How large is the dataset?
What percentage of departure flights were on-time in 2017?
Take a random sample of 50 flights. What percentage of departure flights were on-time in the sample?
Repeat the sampling 30 or more times and graph the distribution of the resulting sample proportions.
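The questions above can be approached as sketched here. Since the real file may not be at hand, this example builds a synthetic stand-in: the data frame `flights`, its size, the labels `"On-time"` and `"Delayed"`, and the 80% on-time rate are all hypothetical and should be replaced by the actual data.

```r
set.seed(4)
# SYNTHETIC stand-in for the real file; size, labels and on-time rate are hypothetical
flights <- data.frame(
  Status = sample(c("On-time", "Delayed"), size = 100000,
                  replace = TRUE, prob = c(0.8, 0.2))
)

n_flights <- nrow(flights)                         # how large is the dataset?
p_pop     <- mean(flights$Status == "On-time")     # proportion on time in the whole dataset

idx   <- sample(n_flights, 50)                     # 50 distinct row numbers chosen at random
p_hat <- mean(flights$Status[idx] == "On-time")    # proportion on time in the sample

# Repeat the sampling 30 times and collect the sample proportions
p_hats <- replicate(30, mean(flights$Status[sample(n_flights, 50)] == "On-time"))

# Stacked dot plot of the 30 sample proportions
stripchart(p_hats, method = "stack", pch = 19, xlim = c(0, 1),
           xlab = "Sample proportion of on-time flights")
```

Taking the mean of a logical comparison gives the proportion directly, which avoids depending on the order of the categories in a frequency table.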
The following code draws 2000 samples and plots the simulated sampling distribution.
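# Load the data (assumes the CSV is in the working directory and has a Status column)
departure <- read.csv("ATL_Departure_Flights_2017.csv")
N <- nrow(departure)           # number of flights (hard-coded as 364655 originally)

n <- 50                        # sample size
pile <- rep(0, 2000)           # one sample proportion per simulated sample
for (i in seq_along(pile)) {
  # Draw n flights without replacement and tabulate their status
  x <- departure$Status[sample(N, n)]
  phat <- table(x) / n
  # Second category of the table, assumed here to be the on-time status
  pile[i] <- as.numeric(phat[2])
}
# Stacked dot plot of the 2000 simulated sample proportions
stripchart(pile, method = 'stack', pch = 19,
           at = 0.15, offset = 0.02, xlim = c(0, 1),
           main = "Sampling Distribution of Sample Proportion (n=50)",
           xlab = "Proportion of on-time departure flights")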